Qi Dou

for the ALFA study

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders

May 30, 2026
Via arXiv

Geometry-Guided Modeling of Foundation Features Enables Generalizable Object Shape Deformation Learning

May 28, 2026
Via arXiv

SurfSurg6D: Geometry Consistent Dense Correspondence for Textureless Surgical Instrument Pose Estimation

May 25, 2026
Via arXiv

Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward

Apr 06, 2026
Via arXiv

Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

Mar 17, 2026
Via arXiv

Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning

Mar 13, 2026
Via arXiv

Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation

Mar 12, 2026
Via arXiv

CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Mar 04, 2026
Via arXiv

The Dresden Dataset for 4D Reconstruction of Non-Rigid Abdominal Surgical Scenes

Mar 03, 2026
Via arXiv

Real-time Monocular 2D and 3D Perception of Endoluminal Scenes for Controlling Flexible Robotic Endoscopic Instruments

Feb 16, 2026
Via arXiv